Designing speech database with prosodic variety for expressive TTS system
نویسندگان
چکیده
For the purpose of building speech synthesis system that can generate high-quality speech with wide range in prosody and realize fine prosody control, we propose new speech database constructing method. As a speech synthesis method, we select a hybrid system which consists of two part : speech unit selection and prosody modification part by STRAIGHT (vocoder type high quality analysis-synthesis method). Our viewpoint for designing database is to reduce amount of prosody modification. which causes quality deterioration. Hence, to make it possible to generate arbitrary prosody within permissible range of prosody modification, we designed 9 sub-databases those consist of same phonetic balanced text set with different prosody. In this paper, we report the designing method and general features of obtained databases. Listening tests focused on durational fearure were also conducted. The results show effectiveness of the method and the necessity to change unit selection cost according to speech rate.
منابع مشابه
طراحی و ارزیابی یک مدل بازسازی گفتار به روش همگذاری واحدهای حساس به بافت نوایی
This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Persian text-to-speech (TTS) synthesis system. Thesyllables used are prosodically conditioned in the sense that a single conventional syllable is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The three levels of the Per...
متن کاملAdapting Prosody in a Text-to-Speech System
The requirements of the evolving information communication technologies (ICT) place new demands on text-to-speech (TTS) systems. The modern high quality TTS system has to be capable of fast and high-quality adaptation to a new language, voice or even expressive speech. Thus adaptation to new voices with different prosodic characteristics is desired. In this chapter a survey of recent and past a...
متن کاملDesigning prosodic databases for automatic modelling in 6 languages
We describe the design and creation of prosodic speech databases for 6 languages. The purpose of the databases is to allow derivation of prosody models in order to improve TTS synthesis. The main prosodic variables to model were word prominence, prosodic boundary strength and phone duration. We describe the database structure and contents and the methodology for creating prosodic databases, and...
متن کاملDialog speech acts and prosody: Considerations for TTS
As natural language dialog systems involving both speech recognition and text-to-speech (TTS) synthesis become more sophisticated, the limitations of general-purpose TTS for human-computer dialogs have become more apparent. Much subtlety and complexity of meaning in natural language dialogs is conveyed by prosody; how something is said is often as important as what words are spoken. At the same...
متن کاملCalliphony: a real-time intonation controller for expressive speech synthesis
Intonation synthesis using a hand-controlled interface is a new approach for effective synthesis of expressive prosody. A system for prosodic real time modification is described. The user is controlling prosody in real time by drawing contours on a graphic tablet while listening to the modified speech. This system, a pen controlled speech instrument, can be applied to text to speech synthesis a...
متن کامل